[None][fix] Fix ValueError and missing decoding statistics for MTP #12063
cascade812 wants to merge 1 commit into NVIDIA:main
Conversation
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
📝 Walkthrough
Two localized adjustments: (1) increased the MTP slot pool size by one to accommodate a CUDA graph dummy request; (2) enhanced the draft length calculation with fallback logic from num_nextn_predict_layers when max_draft_len is absent.
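To illustrate change (1), here is a minimal, hypothetical sketch (not the TensorRT-LLM implementation) of why a slot pool sized exactly to max_batch_size runs out once CUDA graph capture consumes an extra dummy request, and why sizing it one larger avoids the "No free slots" error:

```python
# Illustrative sketch only: a toy slot pool showing the off-by-one.
class SlotPool:
    def __init__(self, num_slots: int):
        self.free = list(range(num_slots))

    def acquire(self) -> int:
        if not self.free:
            raise ValueError("No free slots")
        return self.free.pop()

max_batch_size = 2

# Pool sized to max_batch_size: the CUDA graph dummy request uses up a
# slot, so the last real request fails.
pool = SlotPool(max_batch_size)
pool.acquire()  # dummy request for CUDA graph capture
pool.acquire()  # real request 1
try:
    pool.acquire()  # real request 2 -> raises "No free slots"
    exhausted = False
except ValueError:
    exhausted = True

# Pool sized to max_batch_size + 1: the dummy request has its own slot.
pool2 = SlotPool(max_batch_size + 1)
slots = [pool2.acquire() for _ in range(max_batch_size + 1)]
```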
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/bench/dataclasses/reporting.py`:
- Around line 781-784: The code treats falsy 0 as "missing" and only applies the
new dict-backed fallback, causing object-based configs to skip decoding stats;
update the logic around spec_config to (1) detect if spec_config is a dict and,
if so, use "if 'max_draft_len' in spec_config: return
spec_config['max_draft_len']" (allow 0), else if "num_nextn_predict_layers" in
spec_config return that value (again allowing 0), and (2) for non-dict (object)
configs, use attribute checks like "hasattr(spec_config, 'max_draft_len') and
spec_config.max_draft_len is not None" to return the explicit attribute
(including 0) or fall back similarly to num_nextn_predict_layers; in short, stop
using boolean "or" chains and use explicit key/attribute existence and "is not
None" checks for spec_config.max_draft_len and
spec_config.num_nextn_predict_layers.
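The suggestion above can be made concrete. The sketch below is a runnable version of the reviewer's proposed logic; get_draft_len and FakeMTPConfig are hypothetical names for illustration, not the actual identifiers in reporting.py:

```python
# Hypothetical helper implementing the reviewer's suggestion: explicit
# key/attribute checks instead of boolean `or` chains, so an explicit
# max_draft_len=0 is honored rather than treated as missing.
def get_draft_len(spec_config) -> int:
    if spec_config is None:
        return 0
    # Dict-backed configs (e.g. loaded from YAML): an explicit key wins,
    # including an explicit 0.
    if isinstance(spec_config, dict):
        if "max_draft_len" in spec_config:
            return spec_config["max_draft_len"] or 0
        return spec_config.get("num_nextn_predict_layers") or 0
    # Object-backed configs: use `is not None` so max_draft_len=0 is kept.
    draft_len = getattr(spec_config, "max_draft_len", None)
    if draft_len is not None:
        return draft_len
    return getattr(spec_config, "num_nextn_predict_layers", 0) or 0

class FakeMTPConfig:
    # Stand-in for an object-based speculative config where
    # max_draft_len was never set explicitly.
    max_draft_len = None
    num_nextn_predict_layers = 3
```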
📒 Files selected for processing (2)
tensorrt_llm/_torch/speculative/mtp.py
tensorrt_llm/bench/dataclasses/reporting.py
draft_len = (spec_config.get("max_draft_len")
             or spec_config.get("num_nextn_predict_layers"))
return draft_len or 0
return spec_config.max_draft_len or 0
Handle object configs and explicit zero values separately.
The new fallback only applies to dict-backed configs. Line 784 still reads spec_config.max_draft_len only, so object-based MTP configs can keep returning 0 and skip decoding stats. The or chain also treats an explicit max_draft_len=0 as “missing”, which changes the meaning of a configured value.
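The zero-as-missing pitfall can be shown in two lines (variable names here are illustrative): `0 or x` evaluates to `x`, so an explicitly configured zero silently falls through to the fallback, whereas an explicit key check preserves it.

```python
# `0 or x` evaluates to x, so the `or` chain discards an explicit 0.
spec_config = {"max_draft_len": 0, "num_nextn_predict_layers": 3}

buggy = (spec_config.get("max_draft_len")
         or spec_config.get("num_nextn_predict_layers"))
# buggy is 3, even though max_draft_len was explicitly set to 0.

explicit = (spec_config["max_draft_len"]
            if "max_draft_len" in spec_config
            else spec_config.get("num_nextn_predict_layers", 0))
# explicit is 0, honoring the configured value.
```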
Proposed fix
if ("speculative_config" in self.kwargs
and self.kwargs["speculative_config"] is not None):
spec_config = self.kwargs["speculative_config"]
# Handle both dict (from YAML) and object types
if isinstance(spec_config, dict):
- draft_len = (spec_config.get("max_draft_len")
- or spec_config.get("num_nextn_predict_layers"))
- return draft_len or 0
- return spec_config.max_draft_len or 0
+ if "max_draft_len" in spec_config:
+ return spec_config["max_draft_len"] or 0
+ return spec_config.get("num_nextn_predict_layers") or 0
+ draft_len = getattr(spec_config, "max_draft_len", None)
+ if draft_len is not None:
+ return draft_len
+ return getattr(spec_config, "num_nextn_predict_layers", 0) or 0
zheyuf left a comment: Thanks. Looks good to me.
/bot run
PR_Github #38389 [ run ] triggered by Bot. Commit:
PR_Github #38389 [ run ] completed with state
Description
This PR addresses the following two issues for MTP:
1. ValueError: No free slots when enabling use_relaxed_acceptance_for_thinking or use_sa_spec for MTP with max_batch_size > 1.
2. Missing decoding statistics when max_draft_len is absent from the speculative config.
Test Coverage
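For context on the second issue, here is a hypothetical sketch (not the actual reporting.py code) of why a draft length that resolves to 0 makes decoding statistics vanish: reporters typically guard the stats computation on a positive draft length.

```python
# Hypothetical reporter guard: a zero draft_len skips the statistics
# entirely, so a wrong fallback to 0 silently disables them.
def report_decoding_stats(draft_len: int, accepted_per_step: float):
    if draft_len <= 0:
        return None  # decoding statistics silently skipped
    return {"acceptance_rate": accepted_per_step / draft_len}
```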
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.
Summary by CodeRabbit
Bug Fixes
Refactor